Using the LARA platform to crowdsource a multilingual, multimodal Little Prince

Abstract

We describe an ongoing project in which an informally organised international consortium is using the open source LARA platform to create multimodal annotated editions of Antoine de Saint-Exupéry’s Le petit prince in multiple languages, so far French, English, Italian, Icelandic, Irish, Japanese, Polish, Farsi and Mandarin. The LARA versions of the book include integrated audio and translations and an automatically generated lemma-based concordance, and are freely available online. We describe the methods used to construct the various versions. In some cases, the work for a given language was simply divided by type, typically with one person adding translations and another recording audio. In other cases, we experimented with crowdsourcing methods, splitting the text into chapter-sized units, distributing these to annotators, and then combining the results at the end. Finally, we report an initial classroom study, where the French version was used with intermediate-level Australian students of French.
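The crowdsourcing workflow mentioned in the abstract (split into chapter-sized units, distribute to annotators, combine at the end) can be illustrated with a minimal Python sketch. This is not the LARA platform's code or API: the `CHAPTER <n>` marker, the round-robin assignment, and all function names are hypothetical, chosen only to make the three steps concrete.

```python
import re
from collections import defaultdict


def split_into_chapters(text):
    """Split on lines of the form 'CHAPTER <n>' (an assumed marker format)."""
    parts = re.split(r"(?m)^CHAPTER \d+\s*$", text)
    return [p.strip() for p in parts if p.strip()]


def distribute(chapters, annotators):
    """Assign chapter indices to annotators round-robin (an assumed policy)."""
    assignment = defaultdict(list)
    for i, _ in enumerate(chapters):
        assignment[annotators[i % len(annotators)]].append(i)
    return dict(assignment)


def combine(annotated):
    """Reassemble annotated chapters, keyed by index, in original order."""
    return "\n\n".join(annotated[i] for i in sorted(annotated))
```

Keying the annotated chapters by their original index means contributions can arrive in any order and still be merged deterministically.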

Similar resources

To Crowdsource or Not To Crowdsource?

Crowdsourcing contests—events to solicit solutions to problems via an open-call format for prizes—have gained ground as a mechanism for organizations to accomplish tasks. This paper uses game-theoretic models to develop design principles for crowdsourcing contests and answer the questions: what types of tasks should be crowdsourced? Under what circumstances? When a single task is to be complete...

Annotating the Little Prince with Chinese AMRs

Abstract Meaning Representation (AMR) is an annotation framework in which the meaning of a full sentence is represented as a rooted, acyclic, directed graph. In this paper, we describe a pilot proj...

Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic

Arabic is a language with great dialectal variety, with Modern Standard Arabic (MSA) being the only standardized dialect. Spoken Arabic is characterized by frequent code-switching between MSA and Dialectal Arabic (DA). DA varieties are typically differentiated by region, but despite their wide-spread usage, they are under-resourced and lack viable corpora and tools necessary for speech recognit...

Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications

U-Compare is a UIMA-based workflow construction platform for building natural language processing (NLP) applications from heterogeneous language resources (LRs), without the need for programming skills. U-Compare has been adopted within the context of the METANET Network of Excellence, and over 40 LRs that process 15 European languages have been added to the U-Compare component library. In line...

Multilingual Multimodal Language Processing Using Neural Networks

We live in an increasingly multilingual multimodal world where it is common to find multiple views of the same entity across modalities and languages. For example, news articles which get published in multiple languages are essentially different views of the same entity. Similarly, video, audio and multilingual subtitles are multiple views of the same movie clip. Given the proliferation of such...

Journal

Journal title: Beyond Philology

Year: 2022

ISSN: 2451-1498, 1732-1220

DOI: https://doi.org/10.26881/bp.2022.1.09